Imagine you are the data scientist at a respected media outlet – say the “New York Times”. For the Winter Olympics coverage, your editor-in-chief asks you to analyze some data on the history of Winter Olympics Medals by Year, Country, Event and Gender and prepare some data visualizations in which you outline the main patterns around which to base the story.
Since there is no way that all features of the data can be represented in such a memo, feel free to pick and choose some patterns that would make for a good story – outlining important patterns and presenting them in a visually pleasing way.
The full background and text of the story will be researched by a writer of the magazine – your input should be based on the data and some common sense (i.e. no need to read up on this).
Provide polished plots that are refined enough to include in the magazine with very little further manipulation (already include variable descriptions [if necessary for understanding], titles, source [e.g. “International Olympic Committee”], right color etc.) and are understandable to the average reader of the “New York Times”. The design does not need to be NYTimes-like. Just be consistent.
The main data is provided as an excel sheet, containing the following variables on all participating athletes in all olympics from 1896 to 2016 (sadly, the original source of the data no longer updates beyond that year):
ID: a unique indentifier of the entryName: name of the athleteSex: sex of the athleteAge: age of the athleteHeight: height of the athleteWeight: weight of the athleteTeam: usually the country team of the athlete, with the exception of political accomodations, e.g. the “Refugee Olympic Athletes” team.NOC: national olympic comittee abbreviation.Games: year and season of games.Year: year of gamesSeason: season of games.City: host citySport: a grouping of disciplinesEvent: the particular event / competitionMedal: the particular event / competitionFor example, an event is a competition in a sport or discipline that gives rise to a ranking. Thus Alpine Skiing is the discipline, and Alpine Skiing Women's Downhills is a particular event.
In addition, you are provided with some additional information about the countries in a separate spreadsheet, including the IOC Country Code, Population, and GDP per capita.
athletes_and_events.csv, noc_regions.csv, and gdp_pop.csv. Note, that the noc_regions.csv is the set all NOC regions, while gdp_pop.csv only contains a snapshot of the current set of countries. You have to decide what to do with some countries that competed under different designations in the past (e.g. Germany and Russia) and some defunct countries and whether and how to combine their totals. Make sure to be clear about your decisions here, so that the editor (and potentially a user of your visualizations) understands what you did.ath<-read.csv("athletes_and_events.csv")
gdp <- read.csv("gdp_pop.csv")
noc <- read.csv("noc_regions.csv")
temp_df <- merge(ath, noc, by.x = "NOC",
by.y = "NOC", all.x = TRUE, all.y = FALSE)
total_df <- merge(temp_df, noc, by.x = "NOC",
by.y = "NOC", all.x = TRUE, all.y = FALSE)
Feel free to focus on smaller set of countries (say the top 10), highlight the United States or another country of your choice, consider gender of the medal winners etc. to make the visualization interesting.
Please provide (i) one visualization showing an over time comparison and (ii) one visualization in which a total medal count (across all Winter Olympics) is used. Briefly discuss which visualization you recommend to your editor and why.
Note: Currently, the medal data contains information on each athlete competing, including for team events. For example, in 2014 Russia received 4 gold medals for their men’s win in Bobsleigh Men’s Four alone. Since this is usually not how it is done in official medal statistics, try to wrangle the data so that team events are counted as a single medal. ##### (i)
medals_count_year = total_df[complete.cases(total_df[ , 15]),] %>% filter(Season == 'Winter')%>% count(NOC, Year, sort = TRUE)
medals_count_year <- medals_count_year[order(medals_count_year$n, decreasing=TRUE),]
medals_count_year_top10 <- medals_count_year %>% filter(NOC %in% head(medals_count_year, 10)$NOC)
plot <- ggplot(medals_count_year_top10, aes(x = Year, y = n, group = NOC, color = NOC)) +
geom_line() +
labs(y="Number of Medals", x="Year") +
ggtitle("Trends Of Number of Medals Won") +
theme_bw()
plot
is.winter <- total_df[total_df$Season=='Winter',]
topcountry.game<-is.winter %>%
group_by(NOC) %>%
select(Year) %>%
unique() %>%
summarize(count=n()) %>%
arrange(desc(count),NOC) %>%
mutate(ranke=row_number())
## Adding missing grouping variables: `NOC`
topcountry.game
## # A tibble: 119 × 3
## NOC count ranke
## <chr> <int> <int>
## 1 AUT 22 1
## 2 CAN 22 2
## 3 FIN 22 3
## 4 FRA 22 4
## 5 GBR 22 5
## 6 HUN 22 6
## 7 ITA 22 7
## 8 NOR 22 8
## 9 POL 22 9
## 10 SUI 22 10
## # … with 109 more rows
#how many medals of each type the country won
has.medal <- is.winter %>% drop_na(Medal)
#gold
topcountry.gold <- has.medal%>%
filter(Medal == "Gold") %>%
group_by(NOC) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(rank = row_number())
plot_gold<-head(topcountry.gold,25) %>%
ggplot(aes(x=reorder(NOC, count), y=count,fill=NOC),fill=NOC)+
geom_bar(stat="identity")+
scale_fill_discrete("Number of Gold Medals")+
scale_fill_manual(values=c("USA" = "dodgerblue2", "CAN"= "brown1"))+
theme_bw()+
theme(legend.position="left", legend.title=element_text(size=8), legend.text=element_text(size=8))+
labs(x="Name Of Countries", y="Medals Count", title="Number Of Gold Medals Won")+
theme(plot.title=element_text(hjust=0.5, size=8))+
theme(axis.text.x=element_text(size=9),axis.text.y=element_text(size=8),axis.title.x=element_text(size=10), axis.title.y= element_text(size=10))+
guides(fill="none")+
coord_flip()
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
#silver
topcountry.silver <- has.medal%>%
filter(Medal == "Silver") %>%
group_by(NOC) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(rank = row_number())
plot_silver<-head(topcountry.silver,25) %>%
ggplot(aes(x=reorder(NOC, count), y=count,fill=NOC),fill=NOC)+
geom_bar(stat="identity")+
scale_fill_discrete("Number of Silver Medals")+
scale_fill_manual(values=c("USA" = "dodgerblue2", "CAN"= "brown1"))+
theme_bw()+
theme(legend.position="left", legend.title=element_text(size=8), legend.text=element_text(size=8))+
labs(x="Name Of Countries", y="Medals Count", title="Number Of Silver Medals Won")+
theme(plot.title=element_text(hjust=0.5, size=8))+
theme(axis.text.x=element_text(size=9),axis.text.y=element_text(size=8),axis.title.x=element_text(size=10), axis.title.y= element_text(size=10))+
guides(fill="none")+
coord_flip()
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
#bronze
topcountry.bronze <- has.medal%>%
filter(Medal == "Bronze") %>%
group_by(NOC) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(rank = row_number())
plot_bronze<-head(topcountry.bronze,25) %>%
ggplot(aes(x=reorder(NOC, count), y=count, fill=NOC))+
geom_bar(stat="identity")+
scale_fill_discrete("NOC")+
scale_fill_manual(values=c("USA" = "dodgerblue2", "CAN"= "brown1"))+
theme_bw()+
theme(legend.position="left", legend.title=element_text(size=8), legend.text=element_text(size=8))+
labs(x="Name Of Countries", y="Medals Count", title="Number Of Bronze Medals Won")+
theme(plot.title=element_text(hjust=0.5, size=8))+
theme(axis.text.x=element_text(size=9),axis.text.y=element_text(size=8),axis.title.x=element_text(size=10), axis.title.y= element_text(size=10))+
guides(fill="none")+
coord_flip()
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
plot1<-ggarrange(plot_gold, plot_silver, plot_bronze,nrow=1, ncol=3)
plot1
There are different ways to calculate “success”. Consider the following variants and choose one (and make sure your choice is clear in the visualization):
- Just consider gold medals.
- Simply add up the number of medals of different types.
- Create an index in which medals are valued differently. (gold=3, silver=2, bronze=1).
- A reasonable other way that you prefer.
Now, adjust the ranking of medal success by (a) GDP per capita and (b) population. You have now three rankings: unadjusted ranking, adjusted by GDP per capita, and adjusted by population.
ath = merge(ath, noc, by.x="NOC", by.y="NOC")
ath = merge(ath, gdp, by.x="NOC", by.y="Code")
medals_count = ath[complete.cases(ath[ , 15]),] %>% filter(Season == 'Winter')%>% count(NOC, sort = TRUE)
medals_count <- medals_count[order(medals_count$n, decreasing=TRUE),]
medal_gdp_df = merge(medals_count, gdp, by.x="NOC", by.y="Code")
medal_gdp_df <- merge(medals_count, gdp, by.x = "NOC",
by.y = "Code", all.x = TRUE, all.y = FALSE)
medal_gdp_df <- medal_gdp_df[order(medal_gdp_df$n, decreasing=TRUE),]
Visualize how these rankings differ. Try to highlight a specific pattern (e.g. “South Korea – specialization reaps benefits” or “The superpowers losing their grip”). ###### (i)# of medals adjusted by gdp
#adjusted by gdp
gdp_new <- medal_gdp_df$n/medal_gdp_df$GDP.per.Capita
gdp_new
## [1] 1.131590e-02 1.412765e-02 5.954271e-03 8.461897e-03 1.006830e-02
## [6] 9.343235e-03 6.396347e-03 3.397365e-03 2.859474e-02 6.375634e-03
## [11] 4.143009e-03 2.753965e-03 3.159265e-03 1.891696e-03 9.965515e-03
## [16] 4.159938e-03 1.939821e-03 2.150615e-03 8.684518e-04 2.841365e-04
## [21] 2.613033e-03 1.099018e-03 3.223884e-04 9.705955e-04 9.535509e-04
## [26] 5.201057e-03 NA 4.089142e-04 4.379766e-03 6.660335e-04
## [31] 8.579423e-04 9.617365e-05 3.107853e-04 7.742460e-05 1.971415e-05
## [36] NA 1.345310e-03 2.644945e-05 4.690277e-04
medal_gdp_df_1 <- cbind(medal_gdp_df,gdp_new)
medal_gdp_plot <-medal_gdp_df_1 %>%
ggplot(aes(x=reorder(NOC, gdp_new), y=gdp_new, fill=NOC))+
geom_bar(stat="identity")+
scale_fill_discrete("NOC")+
scale_fill_manual(values=c("USA" = "dodgerblue2", "CAN"= "brown1"))+
theme_bw()+
theme(legend.position="left", legend.title=element_text(size=8), legend.text=element_text(size=8))+
labs(x="Name Of Countries", y="Medals Count", title="Number Of Medals Won Adjusted By Per.Capita.GDP")+
theme(plot.title=element_text(hjust=0.5, size=15))+
theme(axis.text.x=element_text(size=9),axis.text.y=element_text(size=8),axis.title.x=element_text(size=10), axis.title.y= element_text(size=10))+
guides(fill="none")+
coord_flip()
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
medal_gdp_plot
## Warning: Removed 2 rows containing missing values (position_stack).
###### (i)# of medals adjusted by population
#adjusted by gdp
pop_new <- medal_gdp_df$n/log(medal_gdp_df$Population)
medal_gdp_df_2 <- cbind(medal_gdp_df,pop_new)
medal_gdp_plot <-medal_gdp_df_2 %>%
ggplot(aes(x=reorder(NOC, pop_new), y=pop_new, fill=NOC))+
geom_bar(stat="identity")+
scale_fill_discrete("NOC")+
scale_fill_manual(values=c("USA" = "dodgerblue2", "CAN"= "brown1"))+
theme_bw()+
theme(legend.position="left", legend.title=element_text(size=8), legend.text=element_text(size=8))+
labs(x="Name Of Countries", y="Medals Count", title="Number Of Medals Won Adjusted By Population")+
theme(plot.title=element_text(hjust=0.5, size=15))+
theme(axis.text.x=element_text(size=9),axis.text.y=element_text(size=8),axis.title.x=element_text(size=10), axis.title.y= element_text(size=10))+
guides(fill="none")+
coord_flip()
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
medal_gdp_plot
Until the 2014 Sochi Winter Olympics (our data for Winter Olympics end here), there were 19 host cities. Calculate whether the host nation had an advantage. That is calculate whether the host country did win more medals when the Winter Olympics was in their country compared to other times.
Note, that the 19 host cities are noted in the data but not the countries they are located in. This happens commonly and often Wikipedia has the kind of additional data you want for the task. To save you some time, here is a quick way to get this kind of table from Wikipedia into R:
wiki_hosts <- read_html("https://en.wikipedia.org/wiki/List_of_Olympic_Games_host_cities")
hosts <- html_table(html_nodes(wiki_hosts, "table")[[2]], fill=TRUE)[-1]
hosts %>% filter(Winter != "") %>%
select(City, Country, Year)
## # A tibble: 27 × 3
## City Country Year
## <chr> <chr> <int>
## 1 Chamonix France 1924
## 2 St. Moritz Switzerland 1928
## 3 Lake Placid United States 1932
## 4 Garmisch-Partenkirchen Germany 1936
## 5 SapporoGarmisch-Partenkirchen[d] Japan Germany 1940
## 6 Cortina d'Ampezzo Italy 1944
## 7 St. Moritz Switzerland 1948
## 8 Oslo Norway 1952
## 9 Cortina d'Ampezzo Italy 1956
## 10 Squaw Valley United States 1960
## # … with 17 more rows
Provide a visualization of the host country advantage (or absence thereof).
host_cities <- hosts %>% filter(Winter != "") %>%
select(City, Country, Year)
host_cities$Country[host_cities$Country == 'Russia[h]'] <- 'Russia'
diff <- numeric()
for(row in 1:nrow(host_cities)){
if(row != 1){
year = host_cities[[row, 'Year']]
country = host_cities[[row, 'Country']]
host_medal = ath[complete.cases(ath[ , 15]),] %>% filter(Country == country, Season == 'Winter', Year == year) %>% nrow()
previous_medal = ath[complete.cases(ath[ , 15]),] %>% filter(Country == country, Season == 'Winter', Year == year-4) %>% nrow()
print('The difference is ')
print(host_medal- previous_medal)
diff[row] = host_medal-previous_medal
} else{
diff[row] = 0
}
}
## [1] "The difference is "
## [1] 3
## [1] "The difference is "
## [1] 20
## [1] "The difference is "
## [1] -7
## [1] "The difference is "
## [1] 0
## [1] "The difference is "
## [1] 0
## [1] "The difference is "
## [1] 28
## [1] "The difference is "
## [1] 6
## [1] "The difference is "
## [1] 6
## [1] "The difference is "
## [1] 1
## [1] "The difference is "
## [1] 11
## [1] "The difference is "
## [1] 2
## [1] "The difference is "
## [1] 3
## [1] "The difference is "
## [1] 2
## [1] "The difference is "
## [1] 19
## [1] "The difference is "
## [1] 0
## [1] "The difference is "
## [1] 2
## [1] "The difference is "
## [1] 10
## [1] "The difference is "
## [1] 30
## [1] "The difference is "
## [1] 3
## [1] "The difference is "
## [1] 50
## [1] "The difference is "
## [1] 4
## [1] "The difference is "
## [1] 21
## [1] "The difference is "
## [1] 43
## [1] "The difference is "
## [1] 0
## [1] "The difference is "
## [1] 0
## [1] "The difference is "
## [1] 0
host_diff <- data.frame(country = host_cities$Country, diff)
ggplot(host_diff, aes(x=reorder(country, diff), y=diff, fill = country)) +
geom_bar(stat = "identity", width = 0.6) +
theme(axis.text.x=element_text(size=6),axis.text.y=element_text(size=8),axis.title.x=element_text(size=10), axis.title.y= element_text(size=10))+
labs(y="Medal Difference", x="Host Country") +
ggtitle("Visualization Of Host Country Advantage") +
theme_hc()
player_medals_count = ath[complete.cases(ath[ , 15]),] %>%
filter(Medal == 'Gold') %>%
filter(Season == 'Winter')%>%
count(Name, Height, Weight, Sex, sort = TRUE)
top_15_player <- head(player_medals_count, 15)
plot_top_15_player <- ggplot(top_15_player, aes(x=reorder(Name, n),y=n, fill = Name)) +
geom_bar(stat = "identity", width = 0.5) +
labs(y="Number Of Gold Medals", x="Name") +
ggtitle("Players Who Won Most Gold Medals winter Olympic") +
coord_flip()
plot_top_15_player
weight_plot <- ggplot(head(player_medals_count,100), aes(x=Sex, y=Weight,fill=Sex)) +
geom_violin() +
scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Violin Chart For Weight Visualization") +
xlab("Sex")
height_plot <- ggplot(head(player_medals_count,100), aes(x=Sex, y=Height,fill=Sex)) +
geom_violin() +
scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Violin Chart For Height Visualization") +
xlab("Sex")
plot2<-ggarrange(weight_plot, height_plot, nrow=1, ncol=2)
## Warning: Removed 8 rows containing non-finite values (stat_ydensity).
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
## 'Arial Narrow' not found in PostScript font database
## Warning: Removed 4 rows containing non-finite values (stat_ydensity).
plot2
Choose 2 of the plots you created above and add interactivity. One of the plots needs to be written in plotly rather than just using the ggplotly automation. Briefly describe to the editor why interactivity in these visualization is particularly helpful for a reader. ###### By using plotly for Violin Charts, we can bow view the data for a spefific position on diagram
library(plotly)
plotly.w <- ggplotly(weight_plot)
## Warning: Removed 8 rows containing non-finite values (stat_ydensity).
plotly.h <- ggplotly(height_plot)
## Warning: Removed 4 rows containing non-finite values (stat_ydensity).
plotly.w
plotly.h
plot_ly(data = top_15_player,
x = ~reorder(Name, desc(n)),
y = ~n,
color = ~Name,
type = "bar") %>%
layout(title = "Most Successful Atheletes", yaxis = list(title = "Medals"), xaxis = list(title = "Athelete Name"))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
Prepare a selected data set and add a datatable to the output. Make sure the columns are clearly labelled. Select the appropriate options for the data table (e.g. search bar, sorting, column filters etc.). Suggest to the editor which kind of information you would like to provide in a data table in the online version of the article and why.
library(DT)
datatable(medal_gdp_df, colnames = c("Country Code", "Number of Medals","Country Name","Population","Per Capita GDP"))
The data comes in a reasonably clean Excel data set. If needed for your visualization, you can add visual drapery like flag icons, icons for sports, icons for medals etc. but your are certainly not obligated to do that.
Part of the your task will be transforming the dataset into a shape that allows you to plot what you want in ggplot2. For some plots, you will necessarily need to be selective in what to include and what to leave out.
Make sure to use at least three different types of graphs, e.g. line graphs, scatter, histograms, bar chats, dot plots, heat maps etc.
Please follow the instructions to submit your homework. The homework is due on Wednesday, February 16 at 5pm
Yes, the medal counts of the olympics have surely been analyzed before. If you do come across something, please no wholesale copying of other ideas. We are trying to practice and evaluate your abilities in using ggplot2 and data visualization not the ability to do internet searches. Also, this is an individually assigned exercise – please keep your solution to yourself.